Methods and systems for functional analysis of an integrated circuit

ABSTRACT

An apparatus for monitoring operation of a design under test (DUT) comprises a plurality of inputs comprising: an incoming clock edge input connected to detect active clock edges provided to a monitored clock gate; an outgoing clock edge input connected to detect active clock edges sent from the monitored clock gate; an enable input connected to detect enable signals provided to the monitored clock gate and any leaf clock gates connected to receive clock edges through the monitored clock gate; and a protocol input connected to receive protocol signals specifying when the monitored clock gate is required to output active clock edges. The apparatus also comprises a memory in communication with the inputs for storing values from the inputs, and a processor in communication with the memory and the inputs, the processor programmed to determine protocol compliance and to calculate energy consequences of dropping of active clock edges.

CROSS-REFERENCE

This application is a continuation of U.S. application Ser. No. 14/831,505, filed on Aug. 20, 2015, which is hereby incorporated by reference.

FIELD

The present disclosure relates to analysis of integrated circuit designs.

BACKGROUND

The design of an integrated circuit typically includes, among other aspects, functional verification and power analysis. Functional verification refers to a practice of testing the circuit and analyzing the results of the test to determine whether the circuit is performing to specification. For example, given a set of inputs, does the circuit generate the expected output? Functional verification can be executed with a relatively large degree of automation to cover all of the various operation conditions of the circuit. Briefly, functional verification ensures that the logical design of the circuit is correct.

In contrast, power analysis is an aspect of circuit design that is directed to the physical requirements of the design specification. Therefore, power analysis is generally performed separately from functional verification, and the tools for power analysis are different from the tools for functional verification.

Conventional power analysis can report power consumption of each cell and activity in each net of a design, given a design and netlist activity file. However, these power reports do not indicate whether the power consumption of a cell is correlated to the functional workload of the cell. In practice, a cell may be consuming power but not producing useful work. In this case, conventional power analysis would not indicate whether power consumption could be reduced.

It is desirable to obviate or mitigate these shortcomings of conventional power analysis.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present disclosure will now be described, by way of example only, with reference to the attached Figures.

FIG. 1 is an example of a clock-gated circuit for connecting to a clock gate monitor according to an embodiment of the present disclosure.

FIG. 2 schematically illustrates an example clock gate monitor for performing functional power analysis according to an embodiment of the present disclosure.

FIG. 3 is an example of a clock-gated circuit for connecting to a clock gate monitor according to an embodiment of the present disclosure.

FIG. 4 is an example of a clock-gated circuit for connecting to a clock gate monitor according to an embodiment of the present disclosure.

FIG. 5 schematically illustrates another example clock gate monitor for performing functional power analysis according to an embodiment of the present disclosure.

FIG. 6 is an example of a clock-gated circuit for connecting to a clock gate monitor according to an embodiment of the present disclosure.

FIG. 7 is an example of a clock-gated circuit for connecting to a clock gate monitor according to an embodiment of the present disclosure.

FIG. 8 is a flowchart illustrating an example method of processing inputs received at a clock gate monitor according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

Generally, the present disclosure provides methods and systems for verifying a clock-gated integrated circuit using tools that perform both functional verification and power analysis on cells of the clock-gated integrated circuit.

An example tool examines the power consumption of cells under a set of functional workloads. Examining power consumption across a set of workloads enables prediction of power consumption under related, but unmeasured, workloads. Thus, by correlating the functional workload to the power consumption, the power analysis performed by this tool may be considered a functional power analysis. The tool can further determine, from the correlation between the functional workload and the power consumption of a cell, whether the power consumption of the cell, or set of cells, may be reduced.

One aspect of the present disclosure provides an apparatus for monitoring operation of a design under test (DUT) comprising a plurality of combinational logic elements, a plurality of clocked sequential logic elements, and a plurality of clock gate elements connected to selectively provide clock edges to the clocked sequential logic elements. The apparatus comprises a plurality of inputs comprising: an incoming clock edge input connected to detect active clock edges provided to a monitored clock gate; an outgoing clock edge input connected to detect active clock edges sent from the monitored clock gate; an enable input connected to detect enable signals provided to the monitored clock gate and any leaf clock gates connected to receive clock edges through the monitored clock gate; and a protocol input connected to receive protocol signals specifying when the monitored clock gate is required to output active clock edges. The apparatus also comprises a memory in communication with the plurality of inputs for storing values from the plurality of inputs, and a processor in communication with the memory and the plurality of inputs, the processor programmed to determine protocol compliance and to calculate energy consequences of dropping of active clock edges at the monitored clock gate.

One aspect of the present disclosure provides a method for monitoring operation of a DUT. The method comprises detecting active clock edges provided to a monitored clock gate, detecting active clock edges sent from the monitored clock gate, detecting enable signals provided to the monitored clock gate and any leaf clock gates connected to receive clock edges through the monitored clock gate, receiving protocol signals specifying when the monitored clock gate is required to output active clock edges, determining protocol compliance by comparing the active clock edges sent from the monitored clock gate to a set of required edges specified by the protocol signals; and calculating energy consequences of dropping of active clock edges at the monitored clock gate by comparing the active clock edges provided to a monitored clock gate with the active clock edges sent from the monitored clock gate.

Other aspects and features of the present disclosure will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.

FIG. 1 is an example of a clock-gated circuit which can be used to demonstrate the functional power analysis operations of various embodiments of the present disclosure. The circuit 100 comprises a first clock gate 101 connected to a second clock gate 102 and a third clock gate 103. In the illustrated example, clock signals must pass through the first clock gate 101 before reaching the second or third clock gate 102 or 103, and as such the first clock gate 101 may be referred to as a “root”, and the second and third clock gates 102 and 103 may each be referred to as a “leaf”, of a clock gate “tree”.

The first clock gate 101 is connected to each enable input of a first flip-flop 111 and a second flip-flop 112. The second clock gate 102 is connected to each enable input of a third flip-flop 113 and a fourth flip-flop 114. The data inputs of the first and second flip-flops 111 and 112 are triggered by other upstream combinational logic elements (combinational cloud 121) of which the exact nature is unimportant for the purpose of the present disclosure. The data inputs of the third and fourth flip-flops 113 and 114 are triggered by a second combinational cloud 122, which is connected to the outputs of the first and second flip-flops 111 and 112. Thus, the data inputs of the third and fourth flip-flops 113 and 114 are indirectly connected to the outputs of the first and second flip-flops 111 and 112.

Clock gating is a technique that selectively disables synchronous flip-flops from switching states, which reduces the power consumption of the flip-flops and consequently also power dissipation of combinational cells driven by these flip-flops. If the circuit 100 did not have any clock gates, the clock inputs of the flip-flops would be triggered by a common clock, and each flip-flop would switch state on each active clock edge. (As one of skill in the art will appreciate, depending on the design of the circuit in question, positive or negative edges may be active clock edges.)

Clock gating selectively passes the clock signal to the clock input of the flip-flop. If a certain flip-flop does not need to change states (to pass the state of data input to the output) then the clock signal can be gated off by the clock gate in order to reduce power consumption in the flip-flop, as well as in the fanout of the flip-flop (e.g., the combinational logic elements receiving data from the flip-flop).

It is difficult to assess the impact of clock gating in the design under representative operation. From a functional verification perspective, the implementation of the clock gating technique in a design should not destroy critical information (as defined by relevant protocols for that design) that would otherwise propagate through the design if clock edges were provided. The destruction of this information would change the required functional behavior of the circuit and would in effect be a violation of relevant protocols (either design specific protocols or industry protocols) applicable to the design.

From a power analysis perspective, clock gating should not provide additional clock edges over what is minimally necessary to move critical information through the design. Otherwise, the dynamic power consumed consequential to delivering these edges is wasted.

Ideally, clock gating should only add minimal complexity to the clock tree; individual clock gates for every flip-flop would not typically save enough power to justify their insertion. It is an optimization problem to find a set of clock gates and enable logic that saves power by reducing clock edges and discarding propagated information at a small incremental cost in added clock gate cells and combinational cells that define the enable logic for each clock gate.

FIG. 2 shows an example clock gate monitor 200 for performing functional power analysis according to an embodiment of the present disclosure. The clock gate monitor 200 can be used to help find the optimal clock gating logic that is protocol compliant yet power efficient.

In an embodiment, the monitor 200 connects to a design under test or device under test (DUT). The DUT comprises a plurality of combinational logic elements (e.g. combinational clouds 121 and 122 of FIG. 1), a plurality of clocked sequential logic elements (e.g. flip-flops 111, 112, 113, and 114 of FIG. 1), and a plurality of clock gate elements (e.g. clock gates 101, 102 and 103 of FIG. 1) connected to selectively provide clock edges to the clocked sequential logic elements.

In a typical implementation, a DUT would be provided with a plurality of clock gate monitors 200, with one clock gate monitor 200 connected to each clock gate of the DUT. In some implementations, additional clock gate monitors 200 may be connected to the clock gate input of each un-clock-gated flip-flop in the DUT (i.e., at locations where additional clock gates could be added to the DUT), for example in order to assist in evaluation of whether or not to add additional clock gates.

The monitor 200 comprises a plurality of inputs, comprising: an incoming clock edge input 201, an outgoing clock edge input 202, an enable input 203, and a protocol input 204. The monitor 200 also comprises a memory 205 and a processor 206.

The incoming clock edge input 201 is connected to the DUT to detect active clock edges provided to a monitored clock gate.

The outgoing clock edge input 202 is connected to the DUT to detect active clock edges sent from the monitored clock gate.

The enable input 203 is connected to the DUT to detect enable signals provided to the monitored clock gate and any leaf clock gate connected to receive clock edges through the monitored clock gate.

The protocol input 204 is connected to the DUT to receive protocol signals specifying when the monitored clock gate is required to output active clock edges. A protocol signal active edge is preferably provided prior to the delivery time of each required output active clock edge.

In some embodiments, the protocol input 204 comprises two bits, and when either bit is high a required output active clock edge is indicated. With such a configuration, a continuous series of required output active clock edges can be represented in the protocol input as {[0,1], [1,0], [0,1], [1,0], . . . }.
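By way of non-limiting illustration only, the two-bit encoding described above may be sketched in Python as follows. The function names, the one-sample-per-clock-cycle representation, and the choice to count a required edge on each low-to-high transition of either bit are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import List, Sequence, Tuple

Bits = Tuple[int, int]


def alternate_protocol_bits(n_required: int) -> List[Bits]:
    """Encode n back-to-back required edges as alternating two-bit values.

    Hypothetical helper: reproduces the {[0,1], [1,0], [0,1], ...} pattern.
    """
    return [(0, 1) if i % 2 == 0 else (1, 0) for i in range(n_required)]


def count_required_edges(samples: Sequence[Bits], initial: Bits = (0, 0)) -> int:
    """Count low-to-high transitions on either protocol bit.

    Assumption: one sample per clock cycle, and each rising transition on
    either bit announces exactly one required output active clock edge.
    """
    count = 0
    prev = initial
    for cur in samples:
        for bit_prev, bit_cur in zip(prev, cur):
            if bit_prev == 0 and bit_cur == 1:
                count += 1
        prev = cur
    return count


if __name__ == "__main__":
    pattern = alternate_protocol_bits(4)
    print(pattern, count_required_edges(pattern))
```

Alternating between the two bits means that each consecutive required edge produces a fresh rising transition, which a single bit held high across back-to-back cycles would not.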

The memory 205 is in communication with the plurality of inputs 201-204 and stores values from the plurality of inputs.

The processor 206 is in communication with the memory 205 and the plurality of inputs 201-204. The processor 206 is programmed to determine protocol compliance and to calculate energy consequences of dropping of active clock edges at a monitored clock gate.

The monitor 200 provides dynamic analysis of a clock gate to allow confirmation that, cycle to cycle, the clock gate is well coordinated with other clock gates and conforms to relevant protocols. Monitor 200 provides advantages over conventional static analysis techniques, which are only based on toggle counts per net over a time interval, not cycle to cycle behavior of the design.

FIG. 3 shows an example of using a monitor (not shown), such as the monitor 200, to determine a protocol violation. For example, the flip-flop 111 may be part of the write address slave AXI interface of a design. This flip-flop 111 must follow the protocol defined for a slave on the AXI write address channel, governed by AWVALID and AWREADY. The behavior of AWVALID and AWREADY defines when this sequential cell must receive a clock edge to capture the attributes of the AXI write address. The clock gate 102 for the flip-flop 111 inherits the requirements for the specific flip-flop.

The monitor connected to the clock gate 102 will determine whether the clock gate 102 violates the AXI protocol. In the example shown in FIG. 3, an incoming clock edge input (not shown), similar to the incoming clock edge input 201 of the clock gate monitor 200, receives the input of the clock gate 102 (which is the output of clock gate 101); an outgoing clock edge input (not shown), similar to the outgoing clock edge input 202 of the clock gate monitor 200, receives the output of clock gate 102; an enable input (not shown), similar to the enable input 203 of the clock gate monitor 200, receives the enable signal input to clock gate 102; and a protocol input (not shown), similar to the protocol input 204 of the clock gate monitor 200, receives the protocol signal 301.

A processor (not shown), similar to the processor 206 of the clock gate monitor 200, determines, from the information provided by the outgoing clock edge input and the protocol input, that a protocol violation occurred. In particular, in the example of FIG. 3, the protocol signal 301 indicates that two edges should be output from the clock gate 102, but only one edge is actually output as indicated by the signal above protocol signal 301 in FIG. 3.

FIG. 4 shows an example of using a monitor (not shown), such as the clock gate monitor 200, to determine a protocol spurious clock edge. For example, the AXI protocol requires that flip-flops 111 and 112 receive two clock edges, as indicated by protocol signal 401. The AXI protocol requires that flip-flops 113 and 114 receive one clock edge, as indicated by protocol signal 402. Consequently, the monitor can determine that clock gate 103 is outputting an additional unrequired or “spurious” clock signal that is not correlated to the AXI protocol. Consequently, the flip-flops 113 and 114 are consuming more power than necessary given the protocol.
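By way of non-limiting illustration only, the missing-edge determination of FIG. 3 and the spurious-edge determination of FIG. 4 may be sketched as a cycle-by-cycle comparison. The function name, the per-cycle list representation, and the example traces are assumptions made for this sketch; the traces are merely FIG. 3-like and FIG. 4-like and are not read from the Figures.

```python
from typing import List, Sequence, Tuple


def check_protocol(required: Sequence[int],
                   delivered: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Compare protocol-required edges with edges actually output.

    required[i]  is truthy if the protocol requires an output edge in cycle i.
    delivered[i] is truthy if the monitored clock gate output an edge in cycle i.
    Returns (missing_cycles, spurious_cycles).
    """
    if len(required) != len(delivered):
        raise ValueError("traces must cover the same number of cycles")
    missing = [i for i, (r, d) in enumerate(zip(required, delivered)) if r and not d]
    spurious = [i for i, (r, d) in enumerate(zip(required, delivered)) if d and not r]
    return missing, spurious


if __name__ == "__main__":
    # FIG. 3-like case: two edges required, only one delivered (violation).
    print(check_protocol([1, 1, 0], [1, 0, 0]))
    # FIG. 4-like case: one edge required, two delivered (spurious edge).
    print(check_protocol([1, 0, 0], [1, 1, 0]))
```

A non-empty first list corresponds to a protocol violation, while a non-empty second list corresponds to power consumed without a protocol requirement.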

A processor (not shown), similar to the processor 206 of the clock gate monitor 200, can also calculate how much energy is saved by dropping a clock edge at the monitored clock gate. The processor calculates the savings by comparing an incoming clock edge input (not shown), similar to the incoming clock edge input 201 of the clock gate monitor 200, to an outgoing clock edge input (not shown), similar to the outgoing clock edge input 202 of the clock gate monitor 200. For example, when the incoming clock edge input has two edges and the outgoing clock edge input has one edge, the difference between the incoming clock edge input and the outgoing clock edge input shows a single clock edge energy savings credited to the monitored clock gate. In the case where the monitored clock gate has one or more leaf clock gates further downstream in its clock gate tree, the processor can also attribute additional energy savings to the monitored clock gate for dropping an edge when an enable input (not shown), similar to the enable input 203 of the clock gate monitor 200, indicates that such leaf clock gates are enabled, such that but for dropping of the edge at the monitored clock gate that edge would have also propagated to the leaf clock gates, and the monitored clock gate is credited with saving the energy that would have been consumed by the leaf clock gates and their fanouts.
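By way of non-limiting illustration only, the crediting of savings described above may be sketched as follows. The function name, the per-cycle list representation, and the per-edge energy values `gate_energy` and `leaf_energy` (in arbitrary units) are hypothetical parameters introduced for this sketch and are not values given in the disclosure.

```python
from typing import Sequence


def dropped_edge_credit(ck: Sequence[int],
                        eck: Sequence[int],
                        leaf_enables: Sequence[Sequence[int]],
                        gate_energy: float = 1.0,
                        leaf_energy: float = 1.0) -> float:
    """Credit a monitored clock gate for edges it dropped.

    ck[i]           truthy if an active edge arrived at the gate in cycle i.
    eck[i]          truthy if the gate output an active edge in cycle i.
    leaf_enables[i] enable state of each leaf clock gate in cycle i.
    An edge counts as dropped only if it arrived (ck) and was not output (eck).
    For each dropped edge, every enabled leaf adds a further credit, because
    that leaf would have passed the edge had the monitored gate not dropped it.
    """
    if not (len(ck) == len(eck) == len(leaf_enables)):
        raise ValueError("traces must cover the same number of cycles")
    credit = 0.0
    for ck_edge, eck_edge, enables in zip(ck, eck, leaf_enables):
        if ck_edge and not eck_edge:
            credit += gate_energy
            credit += leaf_energy * sum(1 for e in enables if e)
    return credit


if __name__ == "__main__":
    # Two incoming edges, one outgoing edge, one of two leaves enabled
    # in the cycle where the edge was dropped.
    print(dropped_edge_credit([1, 1], [1, 0], [[1, 1], [1, 0]]))
```

Requiring an incoming edge before any credit is given reflects the point that a gate should not be credited for an edge that never reached it.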

FIG. 5 shows a clock gate monitor 500 according to a further embodiment of the present disclosure. The clock gate monitor 500 can be used to help find the optimal clock gating logic that is protocol compliant yet power efficient.

The monitor 500 connects to a design under test or device under test (DUT). The DUT comprises a plurality of combinational logic elements (e.g. combinational clouds 121 and 122), a plurality of clocked sequential logic elements (e.g. flip-flops 111, 112, 113, and 114), and a plurality of clock gate elements (clock gates 101 and 102) connected to selectively provide clock edges to the clocked sequential logic elements.

In a typical implementation, a DUT would be provided with a plurality of clock gate monitors 500, with one clock gate monitor 500 connected to each clock gate of the DUT. In some implementations, additional clock gate monitors 500 may be connected to the clock gate input of sets of un-clock-gated flip-flops in the DUT (i.e., at locations where additional clock gates could be added to the DUT), for example in order to assist in evaluation of whether or not to add additional clock gates.

The monitor 500 comprises a plurality of inputs, comprising: an incoming clock edge input 501, an outgoing clock edge input 502, an enable input 503, a protocol input 504, a data-in input 507, a data-out input 508, an upstream clocking input 509, a downstream clocking input 510, and a time window input 511. The monitor 500 also comprises a memory 505 and a processor 506.

The incoming clock edge input 501 is connected to the DUT to detect active clock edges provided to a monitored clock gate.

The outgoing clock edge input 502 is connected to the DUT to detect active clock edges sent from the monitored clock gate.

The enable input 503 is connected to the DUT to detect enable signals provided to the monitored clock gate and any leaf clock gate connected to receive clock edges through the monitored clock gate.

The protocol input 504 is connected to the DUT to receive protocol signals specifying when the monitored clock gate is required to output active clock edges. A protocol signal is preferably provided just prior to the time at which each required active clock edge is to be output. In some embodiments, the protocol input 504 comprises two bits as described above with reference to FIG. 2.

The memory 505 is in communication with the plurality of inputs and stores values from the plurality of inputs.

The processor 506 is in communication with the memory and the plurality of inputs. The processor is programmed to determine protocol compliance and to calculate power consequences of clock gating.

The data-in input 507 is connected to detect signals on data input pins (D-pins) of sequential logic elements within a fanout of the monitored clock gate. The fanout of the monitored clock gate comprises all of the clocked sequential elements connected to receive clock signals through the monitored clock gate.

The data-out input 508 is connected to detect signals on data output pins (Q-pins) of sequential logic elements within the fanout of the monitored clock gate.

The upstream clocking input 509 is connected to detect active clock edges output from the clock gates controlling the sequential logic elements upstream from the sequential logic elements controlled by the monitored clock gate.

The downstream clocking input 510 is connected to detect active clock edges output from the clock gates controlling the sequential logic elements downstream from the sequential logic elements controlled by the monitored clock gate.

The time window input 511 receives a time window range instructing the processor 506 to perform certain operations for that time window range. The time window range may, for example, be a fixed or adjustable number of clock cycles. In an embodiment, the processor 506 determines power saving based on the energy saved due to dropping of active clock edges at the monitored clock gate for the time window. In another embodiment, the processor 506 determines power saving based on energy saved due to dropping of active clock edges at the monitored clock gate, and energy saved in the fanout of the monitored clock gate, for the time window. In yet another embodiment, the processor 506 determines power savings based on energy saved due to dropping of active clock edges at the monitored clock gate and energy saved in the fanout of the monitored clock gate, and also determines potential additional power savings realizable through elimination of the unnecessary active clock edges for the time window. In yet another embodiment, the processor 506 determines power savings based on energy saved due to dropping of active clock edges at the monitored clock gate, energy saved in the fanout of the monitored clock gate, and potential additional power savings realizable through elimination of the unnecessary active clock edges for the time window. In yet another embodiment, the processor 506 determines power savings based on energy saved due to dropping of active clock edges at the monitored clock gate, energy saved in the fanout of the monitored clock gate, and potential additional power savings realizable through elimination of the unnecessary active clock edges for the time window, and also determines additional power savings realizable through elimination of unnecessary combinational activity.
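By way of non-limiting illustration only, the first of the embodiments above (accounting for edges dropped at the monitored clock gate within each time window) may be sketched as follows, assuming a fixed window expressed as a number of clock cycles. The function name and the per-cycle list representation are assumptions made for this sketch; conversion of edge counts to energy is omitted.

```python
from collections import Counter
from typing import List, Sequence


def dropped_edges_per_window(ck: Sequence[int],
                             eck: Sequence[int],
                             window_cycles: int) -> List[int]:
    """Count edges dropped by the monitored clock gate in each time window.

    ck[i] / eck[i] are truthy when an active edge is seen at the gate input /
    output in cycle i. Windows are consecutive blocks of window_cycles cycles;
    the last window may be shorter.
    """
    if window_cycles <= 0:
        raise ValueError("window_cycles must be positive")
    if len(ck) != len(eck):
        raise ValueError("traces must cover the same number of cycles")
    totals: Counter = Counter()
    for cycle, (ck_edge, eck_edge) in enumerate(zip(ck, eck)):
        if ck_edge and not eck_edge:
            totals[cycle // window_cycles] += 1
    n_windows = -(-len(ck) // window_cycles)  # ceiling division
    return [totals[w] for w in range(n_windows)]


if __name__ == "__main__":
    ck = [1, 1, 1, 1, 1, 1]
    eck = [1, 0, 0, 1, 1, 0]
    print(dropped_edges_per_window(ck, eck, 2))
    print(dropped_edges_per_window(ck, eck, 3))
```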

The monitor 500 provides the ability to determine if clock gates for flip-flops upstream and downstream of each other are well coordinated so that required information propagates with a minimum number of clock edges. Protocol violations and compliance, spurious clock edges, actual energy/power savings due to clock gating and potential additional energy/power savings may be determined by the monitor 500 in substantially the same manner as described above with respect to the monitor 200 of FIG. 2. Energy consumption in the fanout of the monitored clock gate may be determined, for example, based on the data-in and data-out inputs 507 and 508 which indicate the set of sequential elements in the fanout that change output values in response to a clock edge.
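By way of non-limiting illustration only, the use of the data-in and data-out inputs to identify activity in the fanout may be sketched as follows. The function names and the list-of-bits representation (one entry per sequential element in the fanout) are assumptions made for this sketch, as is the reading that a D value differing from the held Q value at a dropped edge marks an update suppressed by the clock gate.

```python
from typing import List, Sequence


def toggled_outputs(q_before: Sequence[int], q_after: Sequence[int]) -> List[int]:
    """Indices of fanout elements whose Q output changed across a clock edge."""
    if len(q_before) != len(q_after):
        raise ValueError("Q samples must have the same width")
    return [i for i, (a, b) in enumerate(zip(q_before, q_after)) if a != b]


def suppressed_updates(d_pins: Sequence[int], q_pins: Sequence[int]) -> List[int]:
    """Indices of fanout elements that would have changed state on a dropped edge.

    Assumption: sampled when an incoming edge was dropped by the monitored
    gate; an element whose D differs from its Q would have toggled had the
    edge been delivered.
    """
    if len(d_pins) != len(q_pins):
        raise ValueError("D and Q samples must have the same width")
    return [i for i, (d, q) in enumerate(zip(d_pins, q_pins)) if d != q]


if __name__ == "__main__":
    print(toggled_outputs([0, 1, 0, 1], [0, 0, 1, 1]))
    print(suppressed_updates([1, 1, 0], [0, 1, 0]))
```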

FIG. 6 shows an example of using the monitor 500 to determine coordination of upstream and downstream clock edges. For example, the clock gate 101 propagates an edge, while the clock gate 102 drops the edge. The clock gate 103, however, propagates the same edge that was dropped by clock gate 102, but clock gate 103 has a fanout that is located downstream of the fanout of clock gate 102. Downstream means that the fanout of the clock gate 103 (the clocked sequential logic elements connected to receive clock edges through clock gate 103) receives data from the fanout of the clock gate 102.

In this case, the monitor 500 is connected to the clock gate 103 and the processor 506 compares the upstream clocking input 509 and the outgoing clock edge input 502 and determines that the upstream clock gate 102 dropped an edge and no new information will be propagated to the fanout of the clock gate 103 for that edge. Therefore, the processor will determine that the clock gate 103 could have dropped the edge that was dropped by clock gate 102, such that additional energy/power savings could be realized.

FIG. 7 shows another example of using the monitor 500 to determine coordination of upstream and downstream clock edges. For example, the clock gate 101 propagates an edge, while the clock gate 103 drops the edge. The clock gate 102, however, propagates the same edge but has a fanout that is located upstream of the fanout of clock gate 103. Upstream means that the fanout of the clock gate 102 (the clocked sequential logic elements connected to receive clock edges through clock gate 102) provides data to the fanout of the clock gate 103.

In this case, the monitor 500 is connected to the clock gate 102 and the processor 506 will compare the downstream clocking input 510 and the outgoing clock edge input 502 and determine that the downstream clock gate 103 dropped an edge and the information propagated by the fanout of clock gate 102 to the fanout of the clock gate 103 for that edge will simply be discarded. Therefore, the processor will determine that the clock gate 102 could have dropped the edge that was dropped by clock gate 103, such that additional energy/power savings could be realized.
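By way of non-limiting illustration only, the upstream and downstream coordination checks of FIGS. 6 and 7 may be sketched as follows. The function names and trace representation are assumptions made for this sketch. The sketch further assumes a particular timing model not stated in the disclosure, namely that information launched by an edge in one cycle can be captured only by an edge in a later cycle.

```python
from typing import List, Optional, Sequence


def edges_without_upstream(eck: Sequence[int],
                           u_eck: Sequence[Sequence[int]],
                           pending: bool = False) -> List[int]:
    """Cycles where the monitored gate output an edge with nothing new to capture.

    eck[i]   truthy if the monitored gate output an edge in cycle i.
    u_eck[i] one bit per upstream clock gate for cycle i.
    pending  assumed initial state: whether uncaptured upstream data exists.
    """
    flagged = []
    for cycle, (edge, ups) in enumerate(zip(eck, u_eck)):
        if edge:
            if not pending:
                flagged.append(cycle)  # no upstream edge since last capture
            pending = False
        if any(ups):
            pending = True  # new data becomes available for a later edge
    return flagged


def edges_without_downstream(eck: Sequence[int],
                             d_eck: Sequence[Sequence[int]]) -> List[int]:
    """Cycles where the monitored gate's edge was overwritten before capture.

    An edge still uncaptured at the end of the trace is left undecided.
    """
    flagged = []
    last_edge: Optional[int] = None  # most recent edge not yet captured
    for cycle, (edge, downs) in enumerate(zip(eck, d_eck)):
        if any(downs):
            last_edge = None  # a downstream gate captured the earlier data
        if edge:
            if last_edge is not None:
                flagged.append(last_edge)  # earlier data was discarded
            last_edge = cycle
    return flagged


if __name__ == "__main__":
    print(edges_without_upstream([0, 1, 1, 0, 1], [[1], [0], [0], [1], [0]]))
    print(edges_without_downstream([1, 0, 1, 1, 0], [[0], [1], [0], [0], [1]]))
```

Each flagged cycle marks an output edge that the monitored clock gate could have dropped, corresponding to the potential additional energy/power savings described above.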

In some embodiments, one or more clock gate monitors (such as monitor 200 or 500 described above) are implemented on a chip, with the inputs implemented on pins of the chip. The following table describes the pins of an example clock gate monitor, with reference to corresponding inputs of monitors 200/500 described above where applicable:

TABLE 1

Pin: ECK
Width: 1
Purpose/Notes: Provides an ability to attribute all past activity to a net power savings or power loss, and check compliance to protocol. (Corresponds to outgoing clock edge input 202/502.)

Pin: D_PINS
Width: R bits determined by total sequential cell input pins.
Purpose/Notes: Together with CK, provides an ability to sample values at the inputs of sequential fanout that may not propagate to the outputs of the sequential fanout because of clock gating (CK moved but not ECK), thereby attributing the reduction in power to the clock gate behavior. (Corresponds to data-in input 507.)

Pin: Q_PINS
Width: Q bits determined by total sequential cell output pins.
Purpose/Notes: Provides an ability to observe activity at the outputs of sequential fanout that did stimulate the combinational fanout, which in turn provides the ability to attribute all dynamic power in the design to the behavior of individual clock gates. Without Q PINS it is not possible to accurately model combinational power consumption nor power dissipated internal to sequential cells. (Corresponds to data-out input 508.)

Pin: U_ECK
Width: N bits, one bit per upstream clock gate that propagates information to the sequential fanout of the monitored clock gate.
Purpose/Notes: Provides an ability to observe whether or not previous to this ECK active edge at least 1 upstream clock gate provided a clock edge to its own fanout, thereby propagating information that could be captured by this clock gate. Lacking at least 1 upstream clock gate ECK active edge, there is no new information for this clock gate ECK to capture. (Corresponds to upstream clocking input 509.)

Pin: D_ECK
Width: P bits, one bit per downstream clock gate that propagates information from the monitored clock gate.
Purpose/Notes: Provides an ability to observe whether or not after this ECK active edge at least 1 downstream clock gate provided a clock edge to its own fanout, thereby capturing information propagated by this clock gate. Lacking at least 1 downstream clock gate ECK active edge, the information propagated by this clock gate has been discarded and hence the ECK edge produced wasted power. (Corresponds to downstream clocking input 510.)

Pin: MUST_ECK
Width: 2
Purpose/Notes: Provides an ability to observe whether or not this ECK edge is required by protocol. The count of past MUST_ECK active edges is either equal to 0 (protocol spurious), 1 (protocol required), or >1 (some protocol required edges were not delivered). One of a pair of bits transitions low-high ahead of the CK active edge to indicate that protocol requires an ECK active edge before the end of the current clock cycle. Two bits are used so that an OR indicates times when clocks should be provided by the clock gate. Two bits are used so that back to back clock edges can produce positive edges on WINDOW near inactive edges of CK, and ahead of the active edge of CK. (Corresponds to protocol input 204/504.)

Pin: CK
Width: 1
Purpose/Notes: Provides an ability to count the number of input clock active edges dropped before ECK has an active edge. For leaf clock gates in particular this ensures they are not credited with dropping edges that were dropped by root clock gates because ECK didn't toggle because CK input didn't toggle either. (Corresponds to incoming clock edge input 201/501.)

Pin: E
Width: S bits, one bit per fanout leaf clock gate in the fanout of the monitored clock gate.
Purpose/Notes: Provides an ability to credit a root clock gate with power savings in leaf clock gates when CK does not propagate to ECK because specific leaf clock gates have their enable pin active. (Corresponds to enable input 203/503.)

Pin: WINDOW
Width: 1, both edges active.
Purpose/Notes: Typical practice is to define static time windows in which power is computed for the activity in the design. In contrast, WINDOW allows the clock gate to track the power consequence of each pin toggle as they occur. WINDOW can be arbitrarily set to the clock period itself or to the duration of a packet passing through the design and similar. The WINDOW pin exposes the time-accuracy of the monitor so that a user can see power dissipated in each packet etc. This flexibility means that unlike current practice it is not necessary for a user to, for example, work backwards from reported power consumed between 100 and 120 nanoseconds to specific clock cycles nor specific design specific events. Ultimately the clock gate monitor is still a digital, event driven, apparatus. It cannot look within a single clock cycle and determine that there is high power dissipation because the clock wave form is very crisp, with a lot of high frequency components. But within limitations of digital event-based modelling the clock gate monitor can be made arbitrarily time accurate. (Corresponds to time window input 511.)
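By way of non-limiting illustration only, the MUST_ECK counting described in Table 1 may be sketched as follows. The function name, the event-string representation, the label strings, and the resetting of the count at each ECK active edge are assumptions made for this sketch.

```python
from typing import Iterable, List


def classify_eck_edges(events: Iterable[str]) -> List[str]:
    """Classify each ECK active edge by the count of preceding MUST_ECK edges.

    events is a time-ordered stream of "MUST_ECK" and "ECK" active edges.
    Assumption: the count is reset after each ECK active edge.
      count == 0 -> "spurious"              (not required by protocol)
      count == 1 -> "required"
      count  > 1 -> "required_edges_missed" (earlier required edges undelivered)
    """
    pending = 0
    labels = []
    for event in events:
        if event == "MUST_ECK":
            pending += 1
        elif event == "ECK":
            if pending == 0:
                labels.append("spurious")
            elif pending == 1:
                labels.append("required")
            else:
                labels.append("required_edges_missed")
            pending = 0
        else:
            raise ValueError(f"unknown event: {event!r}")
    return labels


if __name__ == "__main__":
    trace = ["MUST_ECK", "ECK", "ECK", "MUST_ECK", "MUST_ECK", "ECK"]
    print(classify_eck_edges(trace))
```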

In some embodiments, the clock gate monitor assesses all dropped andpropagated clock edges for power impact. The power impact of a droppedor propagated clock edge is defined as the dynamic power consumed (orsaved) by the clock gate, the sequential fanout of the clock gate, andthe combinational fanout of the sequential fanout of the clock gateconsequential to the propagated (or dropped) clock edge.

In some embodiments, the clock gate monitor also assesses all enabletoggles for power impact. The power impact of an enable toggle may bedefined as the dynamic power dissipated in the clock gate andcombinational cloud that solely provides the clock gate enable signalconsequential to the enable toggle. The word solely here allows a clockgate monitor to, in extreme cases, attribute very small powerconsequence to enable toggles when that enable is generated by acombinational cloud that drives other design cells than the clock gatealone.

By assessing the output clock edges and enable toggles for total power impact, a set of clock gate monitors, one per clock gate in the design, is able to assess dynamic power consumption in the entire design on the basis firstly of enable toggles and output clock edges. In such a full design assessment, ungated sequential cells may be provided with a set of virtual clock gates that pass all clock edges.

To increase the accuracy of the power consequences attributed to the clock gate, in some embodiments the D (input) and Q (output) pins of the sequential fanout of each clock gate are also monitored. These additional inputs allow the specific power consequence within each sequential cell of propagation of D (input) edges to Q (output) edges to be assessed, as well as the specific power consequence of Q (output) edges on the combinational fanout.

The total dynamic power impact of dropped or propagated clock edges, and in aggregate the entire design, may be defined as either required or wasted on the basis of: protocol requirements; the previous active clock edges of upstream clock gates; the subsequent active clock edges of downstream clock gates; the current enable state of fanout clock gates; and, activity at D (input) and Q (output) pins of fanout sequential cells.

The power consequence of the clock gate may be assessed as follows:

1. For all clock gates:

a) A missing protocol-required output active clock edge is defined as missing required power. This indicates that the design does not meet requirements. Otherwise, a dropped output active clock edge is defined as saving power.

b) An additional output active clock edge that is not required by protocol is defined as producing wasted power.

c) All other output active clock edges are defined as producing required power unless the previous behavior of upstream clock gates dictates otherwise.

2. For a clock gate with upstream clock gates:

a) An output active clock edge without at least one previous upstream active clock edge is defined as wasted power as it is not possible for this output active clock edge to capture any new information propagated by an upstream active clock edge.

b) All other output active clock edges are defined as producing required power unless the subsequent behavior of downstream clock gates dictates otherwise.

3. For a clock gate with downstream clock gates:

a) An output active clock edge without at least one subsequent downstream active clock edge is defined as wasted power as it is not possible that any information propagated by this clock gate was captured by downstream sequential elements.

b) All other output active clock edges are defined as producing required power.

4. For a root clock gate with leaf clock gates that drops an output clock edge:

a) The power consequence of this dropped output active edge is increased for each leaf clock gate with an asserted enable signal. The root clock gate is credited with larger power impact for each leaf clock gate that would otherwise have propagated the clock edge.
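By way of a non-limiting illustration, rules 1 through 4 above can be sketched in Python as follows. This is a minimal sketch only: the class `EdgeContext`, the function names, and the use of `None` to mean "no protocol applies" or "no upstream/downstream clock gates exist" are assumptions made for the example and do not appear in the embodiments.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EdgeContext:
    """Observations tied to one output (ECK) active edge (hypothetical names)."""
    must_eck_edges: Optional[int]    # protocol-required edges seen; None if no protocol applies
    upstream_edges: Optional[int]    # previous upstream active edges; None if no upstream gates
    downstream_edges: Optional[int]  # subsequent downstream active edges; None if no downstream gates


def classify_output_edge(ctx: EdgeContext) -> str:
    """Return "wasted" or "required" for a propagated output active edge."""
    if ctx.must_eck_edges == 0:      # rule 1(b): edge not required by protocol
        return "wasted"
    if ctx.upstream_edges == 0:      # rule 2(a): no new information to capture
        return "wasted"
    if ctx.downstream_edges == 0:    # rule 3(a): nothing was captured downstream
        return "wasted"
    return "required"                # rules 1(c), 2(b) and 3(b)


def classify_dropped_edge(required_by_protocol: bool) -> str:
    """Rule 1(a): a dropped edge is either missing required power or a saving."""
    return "missing_required" if required_by_protocol else "saved"


def root_drop_credit_fj(leaf_enables: Sequence[bool],
                        fj_per_leaf_edge: Sequence[float]) -> float:
    """Rule 4(a): credit the root gate for every leaf gate whose enable is asserted."""
    if len(leaf_enables) != len(fj_per_leaf_edge):
        raise ValueError("one energy value is needed per leaf clock gate")
    return sum(fj for enabled, fj in zip(leaf_enables, fj_per_leaf_edge) if enabled)
```

In this sketch a root clock gate that drops an edge while two of its three leaf clock gates are enabled is credited with the sum of the per-edge energies of those two leaf clock gates only.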

The power consequence of the clock gate behavior is increased as follows:

1. For all dropped active clock edges, the power consequence of the dropped active edge is increased by: the set of D input toggles that are not propagated to Q toggles (power which would have been dissipated by sequential fanout); and, the set of Q toggles that do not excite the combinational cloud (power which would have been dissipated by combinational fanout).

2. For all propagated active clock edges, the power consequence of the propagated active edge is increased by: the set of D input toggles that are propagated to Q toggles (internal power dissipated by sequential fanout); and, the set of Q toggles that do excite the combinational cloud (power dissipated by combinational fanout).
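The two adjustments above may be illustrated with the following Python sketch. It assumes, for the purposes of the example only, that each fanout bit contributes a fixed D-to-Q energy plus a per-bit Q-edge energy covering its combinational fanout; the function name `edge_adjustment_fj` and its parameters are hypothetical.

```python
from typing import Sequence


def edge_adjustment_fj(edge_propagated: bool,
                       d_toggled: Sequence[bool],
                       q_toggled: Sequence[bool],
                       fj_per_d_to_q_edge: float,
                       fj_per_q_edge: Sequence[float]) -> float:
    """Energy (fJ) added to the power consequence of one active clock edge.

    Propagated edge: count bits whose Q toggled (energy dissipated).
    Dropped edge: count bits whose D toggled without a Q toggle (energy saved).
    """
    if not (len(d_toggled) == len(q_toggled) == len(fj_per_q_edge)):
        raise ValueError("all per-bit sequences must have the same length")
    total = 0.0
    for d, q, fj_q in zip(d_toggled, q_toggled, fj_per_q_edge):
        counted = q if edge_propagated else (d and not q)
        if counted:
            # Sequential (D-to-Q) energy plus combinational (Q-edge) energy.
            total += fj_per_d_to_q_edge + fj_q
    return total
```

Whether the returned energy is booked as consumed or as saved depends on whether the edge was propagated or dropped; the sketch leaves that bookkeeping to the caller.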

In some embodiments, the activity of the monitored clock gate is viewed by the clock gate monitor as a sequence of transactions. Each transaction is defined by a single output active edge of the clock gate. All activity of the clock gate and cells grouped with the clock gate is associated with specific output active edges of the clock gate in a manner that indicates the net positive (negative) impact of the clock gate on device power. A set of transactions may be grouped together within a window defined by a time window input as discussed above.

In some embodiments, a clock gate monitor maintains statistics for each of a total window, a previous transaction, and a current transaction. Such statistics may, for example, be determined by incrementing various counters for the window, previous transaction and current transaction, as discussed further below. The total window statistics indicate the energy consumed and saved by the clock gate, accumulated over a period that spans all ECK transactions completed within a power window defined by WINDOW pin edges. The previous transaction statistics are maintained because some pin activity is only possible due to an ECK active edge and occurs after the ECK edge. Consequently, this activity can only be recorded for the previous ECK transaction when the current ECK active edge is observed. The current transaction statistics may be used to track pin activity that is not due to an ECK active edge, and as such may be recorded for the current ECK transaction.

As an extension to basic functionality, if a clock gate monitor is observing netlist activity with functional timing and many non-zero transition delays, then ECK edges may be used to define delays at which point pins are examined for activity.

In some embodiments, a clock gate monitor retains a set of 3 statistics only (i.e., window, previous transaction and current transaction), replacing content as events occur. In some embodiments, statistics could be forwarded or copied to an external agent or memory prior to replacing content.

The following table illustrates examples of how pin activity is mapped to transactions in some embodiments:

TABLE 2

Pin | Transaction | Notes

ECK | Definition of Boundaries | Each active edge of ECK defines a new transaction, completes the previous transaction.

D_PINS | Current | Toggles on D pins are sampled on ECK active edges and recorded for the current transaction. Some or none of these toggles will later propagate to Q pin toggles.

Q_PINS | Previous | Toggles on Q pins occur after ECK active edges and are recorded for the previous transaction.

U_ECK | Current | Output active edges from upstream clock gates must have occurred prior to the output active edge of this clock gate otherwise the current output active edge is spurious - not possibly capturing new information.

D_ECK | Previous | Active edges of downstream clock gates must occur after the previous output active edge of this clock gate and before the current output active edge of this clock gate otherwise the previous output active edge is spurious - information was discarded by downstream clock gates.

MUST_ECK | Current | Prior to the current output active edge, protocol monitors have either provided 1, none, or more than 1 MUST_ECK positive edge indicating that the current output active edge is either required, spurious, or insufficient with regards to protocol.

CK | Current | Additional CK active edges prior to the output active edge of the clock gate are claimed as saved energy for the current transaction.

E | Current | Any enable edges prior to the output active edge are claimed for the current transaction to provide indication of when the enable pin is toggling too frequently to allow the clock gate to save any net energy.

WINDOW | Total Window | Edges of the window pin define the end of the total window, triggering computation of energy to power followed by clearing all statistics for the next window.
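As an illustration of the WINDOW row, the conversion of accumulated energy to power at a window edge may be sketched as below. The sketch relies only on the identity that average power is energy divided by time, so that 1 fJ over 1 ns equals 1 microwatt. The class `WindowAccumulator` and its method names are hypothetical and are not part of the described embodiments.

```python
def window_power_uw(total_fj: float, window_ns: float) -> float:
    """Average power in microwatts: 1 fJ / 1 ns = 1e-15 J / 1e-9 s = 1e-6 W."""
    if window_ns <= 0:
        raise ValueError("window duration must be positive")
    return total_fj / window_ns


class WindowAccumulator:
    """Accumulates energy between WINDOW edges (illustrative only)."""

    def __init__(self, start_ns: float = 0.0) -> None:
        self.start_ns = start_ns
        self.total_fj = 0.0

    def add_energy(self, fj: float) -> None:
        """Record energy resolved for a completed transaction."""
        self.total_fj += fj

    def on_window_edge(self, now_ns: float) -> float:
        """Convert energy to power, then clear statistics for the next window."""
        power = window_power_uw(self.total_fj, now_ns - self.start_ns)
        self.total_fj = 0.0
        self.start_ns = now_ns
        return power
```

For example, 400 fJ accumulated in a window running from 100 ns to 120 ns corresponds to an average of 20 microwatts over that window.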

FIG. 8 is a flowchart showing an example method 800 carried out by a clock gate monitor according to one embodiment. Throughout method 800, the clock gate monitor monitors the outgoing clock edge input for an active clock edge at 802. When an outgoing active clock edge is detected (block 802 YES output), the previous transaction statistics are replaced with the current transaction statistics at 804, and the consequences of the previous transaction are resolved at 806. After the previous transaction has been resolved, counters for the total window are incremented based on the previous transaction statistics at 808.

Incrementing counters for the total window may, for example, involve any or all of the following counters:

-   Protocol_spurious_fJ: incremented if the last ECK edge is spurious with respect to protocol.
-   Protocol_ck_edges_lost_fJ: incremented if missing protocol required clock cycles were detected.
-   No_receiver_fJ: incremented if no downstream clock gate accepted data from last ECK edge.
-   No_transmitter_fJ: incremented if no upstream clock gate sent data for capture by last ECK edge.
-   Q_edges_saved_fJ: incremented by difference between observed D edges and Q edges across fanout for last ECK edge using fJ_PER_Q_EDGE[k] for Q bit k, all k.
-   Combo_edges_saved_fJ: incremented by difference between observed D edges and Q edges across fanout for last ECK edge.
-   ECK_edges_saved_fJ: incremented across fanout clock gates for all edges that otherwise would have been delivered to their fanout.
-   E_fJ: incremented for each E (enable) edge between the last active ECK edge and the previous active ECK edge.
-   Total_fJ, power consumed by the cluster of cells grouped with the clock gate: incremented by:
    Q edges*(fJ_PER_D_TO_Q_EDGE+fJ_PER_Q_EDGE)
    +fJ_PER_ECK_EDGE[0]
    +total_leakage_fJ (leakage is calculated based on time as known in the art)
-   Net_fJ, negative representing net energy savings:
    leakage_fj+E_fj−q_edges_saved_fj−eck_edges_saved_fj−fanout_saved−combo_saved_fJ
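The Total_fJ and Net_fJ counters listed above may be illustrated by the following Python sketch. It is a sketch under stated assumptions rather than a definitive implementation: a single scalar per-Q-edge energy is used for Total_fJ, Net_fJ is taken as leakage and enable energy minus every saved-energy term (an assumed reading of the expression above), and all function and parameter names are hypothetical.

```python
def total_fj(q_edges: int,
             fj_per_d_to_q_edge: float,
             fj_per_q_edge: float,
             fj_per_eck_edge: float,
             total_leakage_fj: float) -> float:
    """Energy consumed by the cells grouped with the clock gate for one ECK edge."""
    return (q_edges * (fj_per_d_to_q_edge + fj_per_q_edge)
            + fj_per_eck_edge
            + total_leakage_fj)


def net_fj(leakage_fj: float,
           e_fj: float,
           q_edges_saved_fj: float,
           eck_edges_saved_fj: float,
           fanout_saved_fj: float,
           combo_saved_fj: float) -> float:
    """Net energy attributed to the clock gate; a negative value is a net saving."""
    cost = leakage_fj + e_fj
    saved = q_edges_saved_fj + eck_edges_saved_fj + fanout_saved_fj + combo_saved_fj
    return cost - saved
```

With 0.5 fJ of leakage and 2.0 fJ of enable energy set against 19.0 fJ of combined savings, the sketch returns -16.5 fJ, that is, a net saving.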

At 810, the current transaction counters are cleared and the current transaction statistics are reset based on the most recent outgoing active clock edge detected. Resetting the current transaction statistics at 810 may, for example, involve: recording the total CK active edges dropped prior to the current ECK active edge (this contributes to power savings by this clock gate); recording total ECK active edges not transmitted to leaf clock gates when they were otherwise enabled (this contributes to power savings by this clock gate); recording total E edges prior to the new ECK edge (this contributes to power cost of this clock gate); and/or, if protocol applies, recording the number of protocol required ECK active edges prior to the current ECK active edge (this determines if the current edge is required by protocol and whether previous required edges were not delivered).

At 812, counters for the current transaction are incremented based on events at the inputs of the clock gate monitor. Incrementing the current transaction counters at 812 continues as long as no new outgoing active clock edge is detected (block 802 NO output). When a new outgoing active clock edge is detected (block 802 YES output), the method 800 returns to 804 and continues as discussed above.
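The control flow of method 800 may be sketched in Python as follows. The sketch follows FIG. 8 literally and is illustrative only: it counts every non-ECK pin event against the current transaction and does not model the per-pin previous/current attribution of Table 2. The class `ClockGateMonitor`, the `resolve` placeholder, and the assumption that every CK edge beyond the first in a transaction was dropped are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClockGateMonitor:
    """Keeps window, previous-transaction and current-transaction counters."""
    window: Dict[str, int] = field(default_factory=dict)
    previous: Dict[str, int] = field(default_factory=dict)
    current: Dict[str, int] = field(default_factory=dict)

    def on_event(self, pin: str) -> None:
        if pin == "ECK":                              # block 802, YES output
            self.previous = self.current              # block 804
            resolved = self.resolve(self.previous)    # block 806
            for name, value in resolved.items():      # block 808
                self.window[name] = self.window.get(name, 0) + value
            self.current = {}                         # block 810
        else:                                         # block 812
            self.current[pin] = self.current.get(pin, 0) + 1

    @staticmethod
    def resolve(stats: Dict[str, int]) -> Dict[str, int]:
        """Placeholder resolution of one completed transaction (assumed rules)."""
        return {
            "transactions": 1,
            # Assumption: one CK edge produced the ECK edge; the rest were dropped.
            "ck_edges_dropped": max(stats.get("CK", 0) - 1, 0),
            "e_edges": stats.get("E", 0),
        }
```

A fuller sketch would convert the resolved counts to energy using per-edge energy values and would clear the window counters on each WINDOW edge.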

The examples above include descriptions of ideal clock gating behavior. This is defined by the functional specification of the circuit design. Conventionally, comparing ideal design behavior to actual design behavior is a functional verification exercise. The clock gate monitor embodiments of the present disclosure provide functional power analysis by allowing for comparison of the ideal clock gate behavior with actual clock gate behavior through dynamic, cycle-by-cycle, clock gate power monitoring and processing.

In the preceding description, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the embodiments. However, it will be apparent to one skilled in the art that these specific details are not required. In other instances, well-known electrical structures and circuits are shown in block diagram form in order not to obscure the understanding. For example, specific details are not provided as to whether the embodiments described herein are implemented as a software routine, hardware circuit, firmware, or a combination thereof.

Embodiments of the disclosure can be represented as a computer program product stored in a machine-readable medium (also referred to as a computer-readable medium, a processor-readable medium, or a computer usable medium having a computer-readable program code embodied therein). The machine-readable medium can be any suitable tangible, non-transitory medium, including magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), memory device (volatile or non-volatile), or similar storage mechanism. The machine-readable medium can contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment of the disclosure. Those of ordinary skill in the art will appreciate that other instructions and operations necessary to implement the described implementations can also be stored on the machine-readable medium. The instructions stored on the machine-readable medium can be executed by a processor or other suitable processing device, and can interface with circuitry to perform the described tasks.

The above-described embodiments are intended to be examples only. Alterations, modifications and variations can be effected to the particular embodiments by those of skill in the art. The scope of the claims should not be limited by the particular embodiments set forth herein, but should be construed in a manner consistent with the specification as a whole.

What is claimed is:
 1. An apparatus for monitoring operation of a design under test (DUT), the DUT comprising a plurality of combinational logic elements, a plurality of clocked sequential logic elements, and a plurality of clock gate elements connected to selectively provide clock edges to the clocked sequential logic elements, the apparatus comprising: a plurality of inputs comprising: an incoming clock edge input connected to detect active clock edges provided to a monitored clock gate of the plurality of clock gate elements of the DUT; an outgoing clock edge input connected to detect active clock edges sent from the monitored clock gate; an enable input connected to detect enable signals provided to the monitored clock gate and any leaf clock gates of the plurality of clock gate elements of the DUT connected to receive clock edges through the monitored clock gate; and a protocol input connected to receive protocol signals specifying when the monitored clock gate is required to output active clock edges; a memory in communication with the plurality of inputs for storing values from the plurality of inputs; and a processor in communication with the memory and the plurality of inputs, the processor programmed to determine protocol compliance and to calculate energy consequences of dropping of active clock edges at the monitored clock gate.
 2. The apparatus of claim 1 wherein the plurality of inputs comprises: a data-in input connected to detect signals on D-pins of sequential logic elements within a fanout of the monitored clock gate, the fanout of the monitored clock gate comprising all of the clocked sequential logic elements connected to receive clock signals through the monitored clock gate, and the combinational logic elements that receive data from the clocked sequential logic elements connected to receive clock signals through the monitored clock gate; and a data-out input connected to detect signals on Q-pins of sequential logic elements within the fanout of the monitored clock gate, and wherein the processor is programmed to calculate energy consumed in the fanout of the monitored clock gate.
 3. The apparatus of claim 2 wherein the plurality of inputs comprises: an upstream clocking input connected to detect active clock edges output from clock gates controlling sequential logic elements upstream from the sequential logic elements controlled by the monitored clock gate; and, a downstream clocking input connected to detect active clock edges output to clock gates controlling sequential logic elements downstream from the sequential logic elements controlled by the monitored clock gate, wherein the processor is programmed to determine unnecessary active clock edges sent from the monitored clock gate and calculate potential energy savings realizable through elimination of the unnecessary active clock edges.
 4. The apparatus of claim 1 wherein the plurality of inputs comprises a timing input connected to receive a time window, and wherein the processor is programmed to determine power savings due to dropping of active clock edges at the monitored clock gate for the time window.
 5. The apparatus of claim 2 wherein the plurality of inputs comprises a timing input connected to receive a time window, and wherein the processor is programmed to determine power savings due to dropping of active clock edges at the monitored clock gate, and power consumed in the fanout of the monitored clock gate, for the time window.
 6. The apparatus of claim 3 wherein the plurality of inputs comprises a timing input connected to receive a time window, and wherein the processor is programmed to determine power savings due to dropping of active clock edges at the monitored clock gate, power consumed in the fanout of the monitored clock gate, and potential power savings realizable through elimination of the unnecessary active clock edges for the time window.
 7. The apparatus of claim 1 wherein the processor is programmed to attribute energy which would have been consumed by any leaf clock gates connected to receive clock edges through the monitored clock gate as energy saved due to dropping of an active clock edge at the monitored clock gate when the enable input indicates that the leaf clock gates were enabled when the active edge was dropped.
 8. The apparatus of claim 2 wherein the processor is programmed to calculate energy consumed in the fanout of the monitored clock gate based on a number of sequential logic elements that change signal levels on their Q-pins.
 9. The apparatus of claim 3 wherein the processor is programmed to determine that an unnecessary active clock edge is sent from the monitored clock gate when the upstream clocking input indicates that no active clock edge is sent to sequential logic elements upstream from the sequential logic elements controlled by the monitored clock gate.
 10. The apparatus of claim 3 wherein the processor is programmed to determine that an unnecessary active clock edge is sent from the monitored clock gate when the downstream clocking input indicates that no active clock edge is sent to sequential logic elements downstream from the sequential logic elements controlled by the monitored clock gate.
 11. A method for monitoring operation of a design under test (DUT), the DUT comprising a plurality of combinational logic elements, a plurality of clocked sequential logic elements, and a plurality of clock gate elements connected to selectively provide clock edges to the clocked sequential logic elements, the method comprising: detecting active clock edges provided to a monitored clock gate of the plurality of clock gate elements of the DUT; detecting active clock edges sent from the monitored clock gate; detecting enable signals provided to the monitored clock gate and any leaf clock gates of the plurality of clock gate elements of the DUT connected to receive clock edges through the monitored clock gate; receiving protocol signals specifying when the monitored clock gate is required to output active clock edges; determining protocol compliance by comparing the active clock edges sent from the monitored clock gate to a set of required edges specified by the protocol signals; and calculating energy consequences of dropping of active clock edges at the monitored clock gate by comparing the active clock edges provided to a monitored clock gate with the active clock edges sent from the monitored clock gate.
 12. The method of claim 11 comprising attributing energy which would have been consumed by any leaf clock gates connected to receive clock edges through the monitored clock gate as energy saved due to dropping of an active clock edge at the monitored clock gate when the enable signals indicate that the leaf clock gates were enabled when the active edge was dropped.
 13. The method of claim 12 comprising: detecting signals on D-pins of sequential logic elements within a fanout of the monitored clock gate, the fanout of the monitored clock gate comprising all of the clocked sequential logic elements connected to receive clock signals through the monitored clock gate, and the combinational logic elements that receive data from the clocked sequential logic elements connected to receive clock signals through the monitored clock gate; detecting signals on Q-pins of sequential logic elements within the fanout of the monitored clock gate; and calculating energy consumed in the fanout of the monitored clock gate based on the detected signals on the D-pins and Q-pins.
 14. The method of claim 13 comprising calculating energy consumed in the fanout of the monitored clock gate based on a number of sequential logic elements that change signal levels on their Q-pins.
 15. The method of claim 13 comprising: detecting active clock edges output from clock gates controlling sequential logic elements upstream from the sequential logic elements controlled by the monitored clock gate; detecting active clock edges output to clock gates controlling sequential logic elements downstream from the sequential logic elements controlled by the monitored clock gate; and determining unnecessary active clock edges sent from the monitored clock gate and calculating potential energy savings realizable through elimination of the unnecessary active clock edges.
 16. The method of claim 15 comprising determining that an unnecessary active clock edge is sent from the monitored clock gate when the upstream clocking input indicates that no active clock edge is sent to sequential logic elements upstream from the sequential logic elements controlled by the monitored clock gate.
 17. The method of claim 15 comprising determining that an unnecessary active clock edge is sent from the monitored clock gate when the downstream clocking input indicates that no active clock edge is sent to sequential logic elements downstream from the sequential logic elements controlled by the monitored clock gate.
 18. The method of claim 15 comprising: receiving a time window; and determining power savings due to dropping of active clock edges at the monitored clock gate, power consumed in the fanout of the monitored clock gate, and potential power savings realizable through elimination of the unnecessary active clock edges for the time window.
 19. The method of claim 15 comprising: defining a new transaction as commencing at each detected active clock edge sent from the monitored clock gate; and, maintaining statistics for at least a current transaction and a previous transaction.
 20. The method of claim 19 comprising, upon detecting each active clock edge sent from the monitored clock gate: replacing a set of previous transaction statistics with a set of current transaction statistics; resolving energy consequences of the previous transaction based on the set of previous transaction statistics; and, resetting the set of current transaction statistics based on data corresponding to a newest detected active clock edge sent from the monitored clock gate.